Skip to content

Download databases from GitHub#1229

Merged
shati-patel merged 8 commits intomainfrom
shati-patel/download-gh-db
Mar 25, 2022
Merged

Download databases from GitHub#1229
shati-patel merged 8 commits intomainfrom
shati-patel/download-gh-db

Conversation

@shati-patel
Copy link
Copy Markdown
Contributor

@shati-patel shati-patel commented Mar 22, 2022

🍐 with @robertbrignull

Will fix #1123.

Adds a new command to download a database directly from GitHub. This borrows heavily from existing functionality like the "Download database from LGTM" command 😅

download-db

The main difference is that we need to pass in credentials to authenticate to the GitHub API. We could use octokit directly for downloading the DB, but then we wouldn't be able to re-use all the nice functionality from databaseArchiveFetcher. So instead we get the octokit token (as previously done in #952) and pass in some custom API headers.

Still TODO:

(probably in a follow-up PR)

Checklist

N/A, since this is now also internal-only 🔐

  • CHANGELOG.md has been updated to incorporate all user visible changes made by this pull request.
  • Issues have been created for any UI or other user-facing changes made by this pull request.
  • [Maintainers only] If this pull request makes user-facing changes that require documentation changes, open a corresponding docs pull request in the github/codeql repo and add the ready-for-doc-review label there.

@shati-patel shati-patel marked this pull request as ready for review March 22, 2022 17:26
@shati-patel shati-patel requested review from a team as code owners March 22, 2022 17:26
Copy link
Copy Markdown
Contributor

@adityasharad adityasharad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes look good! Minor suggestions only. Would it be possible to add unit tests for the new command, based on any we have for the existing LGTM command? (EDIT: I see those are in progress, reviewed too quickly!)

Comment thread extensions/ql-vscode/src/databaseFetcher.ts Outdated
Comment thread extensions/ql-vscode/src/authentication.ts Outdated
Comment thread extensions/ql-vscode/src/databaseFetcher.ts
@adityasharad
Copy link
Copy Markdown
Contributor

adityasharad commented Mar 22, 2022

For logos, we currently use icons from both https://github.com/primer/octicons and https://github.com/microsoft/vscode-icons in this repo, and both have a GitHub icon. Give those a try :)

@shati-patel
Copy link
Copy Markdown
Contributor Author

Thanks for the helpful review! 😍 Addressed the comments, and made another small change to hide this command behind the canary flag.

Currently working on unit tests! 🧪 😅

For logos, we currently use icons from both https://github.com/primer/octicons and https://github.com/microsoft/vscode-icons in this repo, and both have a GitHub icon. Give those a try :)

Thank you! Found in https://github.com/microsoft/vscode-icons. Only thing is that some of the other logos have a small green "plus" ➕
image
I don't think it matters too much that we don't have that for the GitHub (and general "cloud") logos, especially since it's hidden behind canary.

@shati-patel
Copy link
Copy Markdown
Contributor Author

Added some tests in d7a1e42!
It's not ideal, since we can't easily make any real octokit requests, but I've tested with a few mock API responses instead. Re-reviews/suggestions welcome 🌅 🙏🏽

const [owner, repo] = githubRepo.split('/');

const octokit = await credentials.getOctokit();
const response = await octokit.request('GET /repos/:owner/:repo/code-scanning/codeql/databases', { owner, repo });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure we're ok here, but can you make sure that we can handle misplaced cApS? Remember the boinc/BOINC repo that gave us trouble for LGTM?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, does this support anonymous downloading of public repos?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API currently does not allow anonymous requests because it is only staff shipped. Even in the private beta it will still require logging in. Once we get to public beta it will likely start accepting anonymous requests for public repos.

However for the extension I'd argue we still want to always make the user log in. It'll simplify our lives because we only have to handle one case. And I would expect the requirement to log in is not a blocked to users here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. I think that's fine for now. It does seem like a small barrier for users who are not using any other github features. This is something we can discuss later when the API endpoint starts accepting anonymous requests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure we're ok here, but can you make sure that we can handle misplaced cApS? Remember the boinc/BOINC repo that gave us trouble for LGTM?

Good point, I hadn't thought of that 💪🏽 From a bit of manual testing, it looks like the API is case-insensitive (e.g. GitHub/CodeQL returns the same as github/codeql). So I think we're okay here!

Also, does this support anonymous downloading of public repos?

Not yet. The entire MRVA API is still staff-only, so we need to pass in credentials for now 🔒

});
if (!languages.length) {
return;
throw new Error('No databases found');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a change to downloading LGTM databases. I can't remember if there was a reason why we didn't throw when there were no languages found for an LGTM project. Can you just make sure that nothing awkward happens when downloading LGTM dbs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to find an LGTM project that doesn't have any languages, so I suspect that's a sufficiently rare case that we've never needed to handle it previously 😅

* - `repo` is made up of alphanumeric characters, hyphens, or underscores
*/
const REPO_REGEX = /^(?:[a-zA-Z0-9]+-)*[a-zA-Z0-9]+\/[a-zA-Z0-9-_]+$/;
export const REPO_REGEX = /^(?:[a-zA-Z0-9]+-)*[a-zA-Z0-9]+\/[a-zA-Z0-9-_]+$/;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably good to move this to a new file. I don't think databaseFetcher should be depending on run-remote-query. Maybe helpers or helpers-pure.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! ✔️

Copy link
Copy Markdown
Contributor

@adityasharad adityasharad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good stuff. I think you've addressed my comments, and I'll leave @aeisenberg to approve when he's happy.

@@ -0,0 +1,3 @@
<svg width="16" height="16" viewBox="0 0 16 16" fill="none" xmlns="http://www.w3.org/2000/svg">
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider an HTML comment (if SVG allows them) or a README file stating where you got these icons from.

}

const databaseUrl = await convertGithubNwoToDatabaseUrl(githubRepo, credentials, progress);
if (databaseUrl) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: You can do the same early-exit trick here:

if (!databaseUrl) {
  return
}
// continue

* }
* We only need the actual token string.
*/
const octokitToken = await octokit.auth() as { token: string };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Could this be (await octokit.auth())?.token, and then you don't need to use .token everywhere below?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense! I still had to keep as { token: string }, since await octokit.auth() is of type "unknown" otherwise 🤷🏽

await commands.executeCommand('codeQLDatabases.focus');
void showAndLogInformationMessage('Database downloaded and imported successfully.');
}
return item;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Move this up into the if(item) case, so it's more obvious what's going on. The return at the end of the function will handle all the fall-through cases where we didn't return early.

Copy link
Copy Markdown
Contributor

@aeisenberg aeisenberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into all of my comments. I think you've addressed Aditya's as well. So... 🚢.

@shati-patel shati-patel merged commit 90d636a into main Mar 25, 2022
@shati-patel shati-patel deleted the shati-patel/download-gh-db branch March 25, 2022 15:24
@shati-patel shati-patel mentioned this pull request Mar 28, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support importing databases directly from github

4 participants